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ABSTRACT 

The distribution of dark matter in the halos that surround galaxies and galaxy clusters has been 
probed observationally, theoretically, and in numerical simulations. Yet there is still confusion about 
which of several suggested parameterized models is the better representation, and whether these 
models are truly universal. We use the observed temperature and density profiles of the intracluster 
medium of 11 relaxed galaxy clusters to investigate mass models for the dark matter halo using a 
thorough statistical analysis. We make careful comparisons between two- and three-parameter models, 
including the issue of a universal third parameter. We find that, of the two-parameter models, the 
NFW is still the best representation, but we also find moderate statistical evidence that a generalized 
three-parameter NFW model with a freely varying inner slope is preferred, despite penalizing against 
the extra degree of freedom. There is a strong indication that this inner slope needs to be determined 
for each cluster individually, some clusters have central cores and others have steep cusps. This implies 
that X-ray observations do not support the idea of a universal inner slope, but perhaps show a hint of 
a dependence with redshift or mass. The mass-concentration relation of our sample is in reasonable 
agreement with the latest predictions based on numerical simulations. 
Subject headings: dark matter — galaxies: clusters: general — X-rays: galaxies: clusters 



1. INTRODUCTION 

The potential of gravitationally bound structures in 
the Universe, ranging in size from dwarf galaxies to 
galaxy clusters, is sourced by a composite mass distri- 
bution of dark matter, baryonic matter in gas form, and 
collapsed objects such as stars in galaxies and galaxies 
in clusters. The investigation of these mass distributions 
entails a number of questions: what is the shape of the 
distributions? Is it universal across ten magnitudes of 
mass and at all redshifts? Does it depend on cosmology 
or on the merger history of the individual halos? Since 
the dominant component of relaxed structures is dark 
matter, much focus has been aimed at dark matter-only 
halos. 

There is little theoretical understanding of the distri- 
bution of matter in a dark matter halo. The main devel- 
opments have been found through numerical simulations 
of the formation of structure in the universe within a 
given cosmological model. Advances have been achieved 
through the improvement of numerical codes as well as 
the increase of raw computing power on one hand and a 
more refined understanding of which questions that need 
to be answered on the other. Perhaps the most funda- 
mental idea that has come out of the numerical approach 
is that relaxed halos are (nearly) universal in many re- 
spects, inc l uding the distribution of matter (Navarro 



et al. 1997 



Taylor & Navarro 200 1J) and the dyn amical 



structure ( frjullock et al.||2001af|Hansen fe Moore||2006 1 . 



However, the simulations have not been able to reach 
agreement about the exact behavior of the profiles in the 
innermost regions, where the limited force resolution of 
simulations sets a lower limit to the radial range that 
can be probed. Various authors claim that the logarith- 
mic slope of the density profile reaches a value between 
—1 and —1.5, perhaps dependent on mass or merger his- 
tory, and there is also discussion whether the inner slope 



is actually universal or not (|Moore et al. 1998 [ Klypin 
let al. |[200T] |Navarro et al ||20fJ4*| I Fukushige et a 



Merritt et ai.|2006||Graham et al |2006||Gao et al.|2008 l 



2004; 



A further complication arises when the simulations are 
compared with observations since the gravitational po- 
tential of the baryonic component, which is very time 
consuming to model in the simulations, cannot be ne- 
glected in the center. This complication can in principle 
both change the slope of the d ark matter profile as well 



1986 



as alter the total mass profile (f Blumcnthal et al 
El-Zant et aTl|200i~l [Ghedin et al.||2004| |Sommer : t! arson 
fe Limousin||2009p 



i'heoretical efforts are hampered by the fact that, even 
under the strongest simplifying assumptions, there are 
not enough constraints to obtain u nique solutions to the 



collisionless Boltzmann equation (Binney & Tremaine 



19871 which governs a dark matter structure. Instead 



one can take phcnomcnological input from numerical 
simulations such as t he density profile itself, the pseudo- 



phase space densit y (Taylor fe Navarro||2001 Dehnen & 



McLaughlin 20051, or the density slope-velocity a mso 
tropy relation (from which Hansen & Stadel ( 2006 1 pre 
diet an inner slope of 0.8), and implement this into a 
Jeans equation analysis to predict the conse quences of 
the 'inspired guess' (see also |Zait et al.| ( |2008[ ) and refer- 
ences therein). Alternatively one can attempt to model 
the formation history of the hal o including majo r merg - 
ers and steady accreti o n (e.g., Ryden fe Gunn| (Il987|); 
Ascasibar et a l.| (|2004|); |Salvador-Sole et al.| (I2007D; |Del] 



Popolo (2009), and references therein). While these ap- 
proaches typically yield results in rough agreement with 
simulations, the modeling can also explore the physical 
connection between the static and dynamic properties of 
the halo as well as offer constrained extrapolations which 
are not accessible in simulations. 

Observationally, there is a strong discrepancy between 
the numerical results and the inferred mass distributions 
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in dwarf and low surface brightness galaxies, which are 
much shallower th an predicted, the so- c alled cusp /core- 
problem (see, e.g. 



Salucci et a l. 
(|2005|); Gilmore et al.| (| 2007| )J 



( M)3| ); |Spekkens et al. 
At the opposite end of 
the mass spectrum, galaxy clusters are typically found 
to be in rough agreement with the cuspy numerical sim- 
ulations, but with even greater scatter for the inferred 
inner slope. There is also significant discussion about 
the type of model and number of parameters that are 
necessary in order to obtain an acceptable description of 
the data. One common method is based on mass model- 
ing through weak or strong gravitational lensing, which 
can yield results wh ich are in good agreement with nu- 
merical sim ulations (|Broadhurst et a l. 2005; Com erford| 



et al 
but 



et al. 



2006| |Limousin et al.l|2008| |Richard et~al.||2009[) 
so profiles that are significantly shallower QSand| 



2004 20081. Another method is based on X-ray 



observations of the intracluster medium (ICM) which is 
supported against gravitational collapse by its ow n pres- 
sure. Again, authors find a ran ge of inner slopes (j Ettori 
et al.|]20021 |Lewis et al.||2003| |Zappacosta et al.||200~ 
Saha & Read||2009). For both lensing and X-ray studies 



>ri 

6; 



most authors focus on only one or a few clusters, which 
of course makes it more difficult to assess the universality 
of the profiles on an observational foundation. 

In the present work we take a sample of 11 highly re- 
laxed clusters and use the measurements of the X-ray 
emitting gas to infer model-independent mass profiles. 
We then compare with a number of different models that 
have been applied as mass profiles in the literature, focus- 
ing on three key questions: Which parameterized model 
is the most successful? How many free parameters are 
needed to describe the data adequately? Is there evi- 
dence for a universal inner slope/shape- type parameter? 
We answer these questions using a detailed statistical 
analysis based on Bayesian inference where we use the 
Bayesian evidence (or marginal likelihood) to make judg- 
ments about which model is preferred by the data. 

2. DENSITY PROFILE MODELS 

Most models that are used for modeling the mass dis- 
tribution in halos have been proposed or introduced as 
fitting formulae applied to the halos found in the numeri- 
cal simulations. Hence these models are not theoretically 
well-founded but rather form a basis on which the pre- 
dictions of numerical simulations can be compared with 
observations. Almost all of these models have two free 
parameters which determine the mass scale and the spa- 
tial extent of the halo, and these two parameters are 
specific to each halo. Some models have one or more ad- 
ditional parameters which determines the shape of the 
profile, and which may or may not be universal. Here we 
consider a number of two- and three-parameter models. 

A whole class of models are 'double power-laws' which 
asymptote to power laws at very small and very large 
radii. These models can conveniently be summarized 
in Hernquis t 's (a,/3, 7) parametrization ( Hernquist|1990 
Zhaolp96l ), 



p(r) 



Po 



1 



(1) 



where po and r s are scaling constants to be determined 
for each halo individually. The inner power-law slope 



is a and the outer slope is (3, while the width of the 
transition region is controlled by 7. W e consider four 
such two-parameter profiles: the NFW (Na varro et al 



1997| ), the Dehnen-McLaughlin ( Dehnen fc M cLaughlin 
|2005| ), the Hernquist, and the Moore prohle ( |Moore et a 
|1998[ ). The properties of these models are summarized 
mTable[T] 

We also consider three three-parameter models: Two 
are simply generalized NFW profiles where, in the first 
case, we allow the inner slope a to vary. The motivation 
for this slopeNFW model was already apparent from the 
introduction. The second case, transNFW, is also a gen- 
eralization of the NFW where now the transition param- 
eter 7 is free. Such a profile can mimic a steeper inner 
slope by pushing the inner power law behavior closer to 
the cen ter. The th i rd profile is t he Sersic (or Einasto) 
profile ( |Sersic||1963| |Einasto||1969[ ), 

p(r)=p s exp(-2n (j^J -l^j, (2) 

where the parameter n determines the shape of the pro- 
file. For n — 4 the de Vaucouleurs' law describing the 
surface brightness of elliptical galaxies is recovered. The 
shape parameter is sometimes given as a s = n _1 . Re- 
cently, the Sersic profile has been claimed to provide 
a better fit than the NFW to Milky Way-sized haloes 
formed in numerical simulations, and, interestingly, with 
a sha pe parameter that varies significantly from ha lo to 
halo ( jSalvador-Sole et al ||2007| |Navarro et aTl2008 1 . 

We map the scale radius r s and scale densities p s or 
po of each model to the model-independent parameters 
r_ 2 and p-i, which are the radius at which the slope 
of the density profile is —2 and the density at that ra- 
dius, respectively. This mapping makes comparison of 
the models easier and enables identical priors to be used 
in the statistical analysis in all models. 

3. DATA ANALYSIS 

We revisit the sample of 11 highly relaxed, low redshift 
(z < 0.1) galaxy clusters observed with X MM-Newton 
which we already used in Host et al. (20091 to measure 
the dark matt er velocity amsotropy profil e for the first 
time (see also Hansen & Piffaretti ( 2007 1) . The mem- 
bers of this sample were selected to appear close to round 
on the sky and not have strong features in the temper- 
ature and density profiles. The spectral analysis and 
deprojection of t he X- ray data was carried ou t in |Kaas- 
tra et al.| ( |2004[ ) and |Piffaretti et~al] ( |2005[ ). The de- 
projection method was non-parametric, i.e. without any 
parametric modeling of the radial temperature or den- 
sity profiles. The outcome, and the starting point for 
the present analysis, was estimates of the ICM tempera- 
ture Ti and electron number density n e ^ with associated 
uncertainties in six or seven radial bins, for each of the 
clusters. 

Assuming hydrostatic equilibrium, the IC M gas traces 
the gravitational po tential according to ( jCavaliere fe; 
Fusco-Femiano||1978[ ) 

k B T ( dlnn e i dlnT\ GM tot (r) 



pm H 



d In r d In r 



where p = 0.6 is the mean molecular weight of the 
ICM. Almost all of the cluster mass resides in dark 
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TABLE 1 
Density profile models 



Model 


(a, P, 7) 


r~2/r s 


P-2/P0 


fi(x = r/r B ) 


NFW 


(1,3,1) 


1 


1 
4 


ln(l +x) - x/(l + x) 


D&M 


,7 31 4n 
^9' 9 ' 9^ 


121 
169 


0.0338 


»j(l +**/»)-« 


Hernquist 


(1,4, 1) 


1 

2 


16 
27 


x 2 /[2(l + x) 2 ] 


Moore 


(1,3,1) 


1 
2 


8 

3V3 


2sinh- 1 ( v ^) - 2 s /x/(l + x) 


slopeNFW 


(a, 3,1) 


2- a 


(2 - a)~ a (3 - a) a ~ 




transNFW 


(1,3,7) 


1 


1 

4 




Sersic 




1 


1 


8-™e 2n n 1 - 3n 7(3n, 2nx 1 /") 


Note. — 


Properties 


of the density profiles that we 


consider, including the (pt,fi, 7) 



specification, the relations between (r_ 2, p— 2) and (r s , po), and the shape fi(r) of the mass 
profile M(r) = 47rrf pop(r), if analytical. 7(11,2) is the lower incomplete gamma function, 
■y(a,x) = f*t a - 1 e-*dt. 



matter and the ICM, and therefore the dark matter 
mass distribution can be determined through A/dm(?") = 
M to t(r) — MicmM- The ICM mass profile is given 
straight-forwardly by the density picm = /imjin e . 

We calculate Mdm^i) of each radial bin through a 
Monte Carlo (MC) analysis in order to propagate uncer- 
tainties accurately. In detail, the prescription for each 
MC realization is as follows: In each bin i the best es- 
timates of Ti and added to random numbers 
drawn from Gaussian distributions representative of the 
uncertainties STi and Sn e j. In order to apply Equation 
([3j, we estimate the logarithmic derivative of, e.g., T 
at the bin-radius fj by the slope of the unique parabola 
that passes through (lnrj_i,lnTj_i), (In r j , In Tj ) , and 
(lnrj+i, lnTi+i). In this way we can calculate the total 
mass interior to r» for that data realization. We subtract 
the gas mass, estimated through a five-point Newton- 
Cotes integration formula applied to the same realization 
of the density data, and we arrive at the dark matter 
mass MuM,i- We impose a number of checks to deter- 
mine if the derived data realization is physically sensible: 
the ICM temperature and density must be greater than 
zero in all bins, the total mass profile must be increas- 
ing with radius, and the dark matter mass profile and 
derived density profile must also be everywhere positive. 
If these conditions are not met the entire realization is 
discarded. This process is repeated until N — 5000 re- 
alizations have been accepted. From these the sample 
mean of In M, in each bin is determined, as well as the 
sample covariance matrix with elements 



1 



1 



N 

~N ^ 



{\nM lk - QnMi))Qn.M jk - (lnM 3 )), 



. ( 4 ) 

where N is the number of Monte Carlo realizations. 
Even though we sample the ICM temperature and den- 
sity in each bin independently, the covariance matrix 
is not diagonal since the derivatives and physical con- 
sistency checks induce bin-to-bin correlations in the ac- 
cepted sample. We use the mean and covariance of In M 
rather than M since, by inspection, the former is closer 
to being Gaussian distributed. 

4. STATISTICAL ANALYSIS 

We take a Bayesian approach to the statistical analysis 
and the usual starting point is the likelihood function, 
which we calculate in the following manner. 



It requires less manipulation of the data to calculate 
the mass profile from the observations than to calcu- 
late the density profile. Therefore we integrate the den- 
sity profile analytically or numerically for each model 
to obtain the mass distribution and compare with the 
data in mass space, not density space. Further, as men- 
tioned above, we have found in the MC analysis that 
the mass samplings in each bin are close to being log- 
normally distributed. Therefore we construct the likeli- 
hood C{Mi) = exp(— x 2 /2) from the % 2 function, 



X 



In M( n ))Cr/ (In M, 



laMfa)), (5) 



hi 



where M(r^) is the model mass profile at the radial centre 
r,j of bin i, and In Mj and Cy are determined by the MC 
analysis. 

The main goal is to decide which model is the bet- 
ter representation of the data. We do this by calcu- 
lating the Bayesian evidence of each model, which is a 
quantitati ve measure of the agreement between model 
and data (Trotta 2008). First we calculate the likeli- 
hoods of each model on a grid in the parameter space 
9 = (log r_2, log p-2). Next, we construct the poste- 
rior probability distribution by combining the likelihood 
function with a prior probability distribution ir(0) resem- 
bling our knowledge of log r_2 and log p-2 before taking 
the data into account. We discuss the choice of prior 
below. We then integrate the posterior to obtain the 
Bayesian evidence, 



E= d07r(0)£(0,hiMi), 

Jail 



(6) 



which is essentially the weighted average of the likelihood 
over the prior volume. The evidence of a model, given 
the data and a prior, quantifies how well that model ex- 
plains the data. It is important to stress that the com- 
parison is made over all of the prior volume, not just at 
the best fitting set of parameters. When comparing mod- 
els the Bayes factor B12 = E1/E2 shows how much more 
(or less) probable model 1 is than model 2, in light of 
the data. Traditionally, this is gauged on Jeffrey's scale 
where a Bayes factor of In B 12 < 1 is labeled 'inconclu- 
sive' evidence for model 1 over model 2 while 'weak', 
'moderate', and 'strong' evidence corresponds to lni?i2 
values < 2.5, < 5, and > 5, respectively. 

We choose priors which are constant in the logarithms 
of r_2 and p-%. The flat logarithmic prior is the uninfor- 
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TABLE 2 

Bayes Factor InS for the two-parameter models, relative 

TO THE NFW PROFILE 



Cluster 


2 


D&M 


Hernq. 


Moore 


A262 


0.015 


-2.0 


0.9 


-3.0 


NGC533 


0.018 


-1.7 


1.2 


-3.0 


A496 


0.032 


-1.4 


0.6 


-1.2 


2A0335+096 


0.034 


0.5 


-0.1 


13.2 


A2052 


0.036 


1.9 


-0.3 


5.8 


MKW9 


0.040 


0.5 


-0.1 


1.4 


MKW3s 


0.046 


1.8 


-0.3 


6.2 


A4059 


0.047 


1.5 


-0.4 


9.5 


Sersic 159-3 


0.057 


-0.5 


1.6 


2.7 


A1795 


0.064 


2.5 


-0.5 


17.9 


A1837 


0.071 


0.5 


-0.2 


1.2 


Total 




3.6 


2.4 


51 



Note. — A positive value of In B indicates that the NFW profile 
is preferred over the considered model. Note that this does not 
imply any bias towards the NFW as the Bayes factor of any two 
other models is just the difference between the respective Bayes 
factors given here. 
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mative prior for scaling parameters ( |Trotta|2008 ) since i1 
reflects ignorance about the magnitude of the parameter. 
However we restrict the range of the priors, so that we 
end up with top-hat priors in logr_2 and logp_2- As a 
reference point we first assume a top-hat prior relative 
to the best estimate of r25oo as determined in the MC 
analysis. (The scale radius r25oo is defined as the radius 
within which the mean density is 2500 times the criti- 
cal density of the universe.) The top-hat prior in log r_2 
ranges from 1.5 magnitudes below r 2 5oo to 0.5 above. 
The basic idea behind this prior is that the transition or 
'roll' of a model should occur close to T2500j as it does in 
haloes in numerical simulations, and also to prevent the 
model from behaving as a simple power-law by pushing 
the transition from the inner to the outer power law far 
away from the range of the data. We emphasize that 
this is still a conservative prior, as current simulations 
typically resolve 2-3 radial magnitudes with r_2 located 
about one order of m agnitude below the virial radius 
(Bullock et al. 2001b I. The prior in p_ 2 is also a top- 
hat in the logarithm and a range of 10 _26 -10 -21 kgm~ 3 , 
which in practice means that the likelihood is vanishingly 
small at the boundaries of the prior. 

4.1. Two-parameter model results 

The result of the model comparison is summarized in 
Table [2] where the NFW model is compared against each 
of the other two-parameter models. A positive Bayes 
factor indicates that the NFW model is preferred. This 
does not imply any bias on the NFW since any two mod- 
els can be compared by subtracting the Bayes factors we 
give for them from one another. We find that, individ- 
ually, the clusters yield strong constraints only against 
the Moore model, while the evidences for or against the 
D&M and Hernquist models are either weak or incon- 
clusive on Jeffrey's scale. If instead we consider the cu- 
mulative Bayes factor summed over the full sample, the 
NFW is found to be the preferred model overall, i.e., as 
a universal two-parameter profile our sample favors the 
NFW model. The Hernquist profile and the D&M profile 
are weakly and moderately disfavored, respectively, with 
cumulative Bayes factors of 2.4 and 3.6 while the Moore 
profile is convincingly ruled out with a factor of 51. The 
weak constraint on the Hernquist profile is not surprising 



as data extending out to the virial radius would likely be 
needed to properly distinguish this model from the NFW. 

In Table [3] we present the effects of varying the pri- 
ors. The evidence against the D&M profile increases 
to the level of strong when we limit the range of the 
prior in logr_ 2 to the smaller interval (—0.75,0.25), 
while the Bayes factor is reduced slightly on the larger 
range (—3,3). The evidence also becomes strong if we 
choose top-hat priors in (r_2,/0_2) instead of the loga- 
rithmic priors. Finally, the D&M model is disfavored 
slightly more if we apply a 'soft' Gaussian prior in log r_2- 
The Bayes factor for the Hernquist model is robust un- 
der the same variations, while the Moore profile is very 
strongly ruled out in all cases. We conclude that our two- 
parameter model selection results are stable against vari- 
ation amongst reasonable choices of priors, which means 
that the data are of sufficient quality to make robust 
conclusions. 

A more interesting issue to consider than the priors is 
that the preference for the NFW profile over the Hern- 
quist and D&M profiles is somewhat susceptible to 'jack- 
knife' resampling: if we recompute the cumulative Bayes 
factor eleven times systematically leaving a single cluster 
out each time, there are a few cases where the strength 
of the evidence is reduced to inconclusive but also cases 
where it is increased to strong (against the D&M). This 
is largely due to the fact that our data sample is some- 
what inhomogeneous in terms of the relative statistical 
uncertainty on the mass profile. For example, a com- 
parison of the error bars of MKW9 with those of A1795 
or Sersic 159-3 (see Figure [T]) immediately shows that 
the former is much less constraining than the latter two. 
This means that our sample is a mixture of strongly and 
weakly constraining clusters and this is reflected in Fig- 
ure [2] where the contributions from individual clusters 
clearly varies. There appears to be a trend that the 
clusters A262, NGC533, and A496, which are the low- 
est redshift and some of the least massive in our sample, 
stand out by preferring the D&M and the Moore profile. 
However, such trends are just as likely spurious selection 
effects caused by the relatively small sample but could 
be investigated with a larger sample. The D&M profile 
can easily be preferred by clusters that also prefer the 
Moore profile since, by extending the transition region, 
the D&M profile can push the inner asymptotic power 
law well inside the radial range of the data. 

Finally we compare with a standard goodness-of-fit 
test: the minimum x 2 values for the models support our 
more detailed analysis: for a total of 53 degrees of free- 
dom we get minimum x 2 's of 81 for the NFW, 93 for the 
D&M, 82 for the Hernquist, and 190 for the Moore pro- 
file. Major contributions to these x 2 values come from 
the two clusters MKW3s with % 2 = 14.2 and A4059 with 
X 2 = 13.2 for the NFW model and similar or larger values 
for the other models. The corresponding p-values imply 
that the D&M x 2 is about 20 times less likely to have 
occurred by chance (if the D&M model is correct) than 
the NFW model is (if the NFW model is correct) . Com- 
pare this with the Bayesian odds that the NFW is ~ 40 
times more probable than the D&M. Note that the actual 
best-fits are slightly smaller since we evaluate the x 2 on 
a grid instead of minimizing it with a dedicated search. 
The x 2 values show that, also in terms of goodness-o-f- 
fit, our sample is rather inhomogeneous. The rather poor 
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Fig. 1. — Mass profile of each cluster with 68% uncertainties and best-fit models. The radial axis has been scaled to the best estimate 
of T2500 from the MC analysis, and the mass axis has been scaled by r~ 1 so that the fitted models are approximately horizontal at r_2- 



total fit should not be judged too harshly since the halos 
in numerical simulations also show halo-to-halo scatter, 
which is not accounted for by the fitting profiles. 

4.2. Three-parameter model results 

For the three-parameter models we again want to test 
whether the models represent the data better than the 
NFW. In this case the comparison is slightly more in- 
volved since there is the freedom of an additional param- 
eter to take into account. This naturally yields a lower 
value of the evidence if the extra parameter does not 
provide a better description of the data, or, more specif- 
ically, the third parameter must improve the fit over a 



significant volume of parameter space in order to be pre- 
ferred over the NFW. It is important to stress that there 
is no assumption about the third parameter being uni- 
versal. On the contrary, we ask whether the data require 
the additional freedom of an extra parameter which is 
determined individually for each cluster. 

The model comparison proceeds as before: we calcu- 
late the evidence for each of the three-parameter models 
with the same priors in logr_2 and logp_2 as in the 
fiducial two-parameter analysis for all models. For the 
slopeNFW we choose a top-hat prior for a which ranges 
from to 1.75, i.e. from a cored profile to a profile slightly 
steeper than the Moore profile. We do not want to go all 
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Fig. 2. — Bar chart of the Bayes factors \nB for the various models considered, relative to the NFW, as given in Table [2] and [4] The 
Bayes factors are additive so the contribution of individual clusters to the total Bayes factor is easily assessed. The values shown arebased 
on the fiducial priors discussed in the text. 



TABLE 3 

Total Bayes factor InS for the two-parameter models 

ASSUMING VARIOUS PRIORS, RELATIVE TO THE NFW PROFILE 



Prior 




Range log p_2 


D&M 


Hernq. 


Moore 


Top-hat in log 


r-2 


(-1.5,0.5) 


3.6 


2.4 


51 


Top-hat in log 


r-2 


(-3,3) 


2.8 


2.5 


36 


Top-hat in log 


r-2 


(-0.75,0.25) 


6.8 


2.3 


63 


Top-hat in (r_ 


-2.P-2) 


(-1.5,0.5) 


8.8 


1.3 


60 


Gaussian in lo 


gr_ 2 




4.8 


2.3 


50 



Note. — The top line is the fiducial prior used in Table[2] In the 
next two cases the range of the prior in logr_2 (in units of r2500> 
see text) is varied, and in the following two cases a top-hat prior 
in r_2 and both r_2 and p_2 is applied. The final case assumes a 
Gaussian prior in logr_2 with mean -0.25 and width 0.5. 



the way to —2 since r_2 tends to zero and eventually be- 
comes undefined as a approaches —2. For the transNFW, 
we choose a logarithmic prior with 7 in the range (0.1, 4) 
which allows this profile to mimic a steeper inner pro- 
file by pushing the asymptotic inner power law inside 
the radial range of the data. Finally, we take a logarith- 
mic prior for n in the range (2,15) for the Sersic model, 
motivated by numerical simulations which have best fits 
Sersic profiles with n=5-9. The logarithmic prior has 
the advantage that it is invariant whether one prefers n 
or a s = l/n as the parameterization. 

The resulting Bayes factors relative to the NFW are 
given in Table [4] and summarized in the chart in Fig- 
ure [2] The individual clusters provide only weak evi- 
dence for or against any of the models. Based on the 
whole sample, the model selection is inconclusive for the 
transNFW and Sersic models but there is 'moderate' ev- 
idence for the slopeNFW model over the NFW with a 
Bayes factor of —2.6. This corresponds to odds of 13 to 1 
in favor of the slopeNFW model and shows that, overall, 
the slopeNFW has the highest evidence E of all models 
considered. Hence the data show a moderate need for 
a free inner slope despite the penalty against the extra 
freedom built into the Bayesian analysis. It must be men- 
tioned that most of the discriminatory power is carried 
by a few clusters such as NGC533, A4059, and A1795 
and removal of any of these clusters from the sample 
would change the Bayes factor significantly. Therefore we 



TABLE 4 

Bayes Factor hiB for the three-parameter models, 
relative to the NFW profile. 



Cluster 



slopeNFW 



transNFW 



Sersic 



A262 

NGC533 

A496 

2A0335+096 

A2052 

MKW9 

MKW3s 

A4059 

Sersic 159-3 

A1795 

A1837 



-2.1 
-1.8 
-1.1 
1.1 
0.2 
0.2 
1.4 
-2.5 
0.3 
1.8 
-0.1 



-1.6 
-1.9 
-0.7 
0.6 
1.2 
0.4 
1.7 
-1.4 
0.9 
1.3 
0.2 



-2.0 
-1.8 
-0.9 
1.0 
0.6 
0.3 
1.5 
-1.8 
0.5 
2.4 
0.1 



Total 



-2.6 



0.7 



-0.1 



Note. — A positive value of In B indicates that the NFW profile 
is preferred over the considered model. A top-hat prior in logr_2 
of (—1.5,0.5) around the best estimate of r25oo f° r each cluster is 
assumed. 

caution that the moderate preference for the slopeNFW 
model is somewhat susceptible to selection effects since, 
as noted above, the constraints from individual clusters 
vary in quality. We also find some sensitivity to the 
choice of prior: if the upper bound of a is extended from 
1.75 up to 1.9, the Bayes factor for the slopeNFW model 
changes to —1.9, while if it is set to the Moore profile at 
1.5 the Bayes factor becomes —3.1. If the lower bound 
of a is increased to 0.5, the Bayes factor remains rela- 
tively unchanged at —2.9. While we believe that the fidu- 
cial priors used above are reasonable descriptions of the 
'state of knowledge' based on numerical simulations, the 
sensitivity to the choice of prior indicates that the data 
do not necessarily confine the posterior to a sufficiently 
small region of the prior volume to provide unambiguous 
conclusions. 

4.3. Constraints on the third parameters 

Finally, for the three-parameter models we also want 
to constrain the preferred value of the third parame- 
ter. Unlike above, this analysis assumes that there is 
a universal value for the third parameter and attempts 
to identify that value. We use the same priors as in the 
previous analysis for each model, but now we marginal- 
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Fig. 3. — Probability distributions for the third parameter in each of the three-parameter models: slopeNFW a (left), transNFW 7 
(middle), and Sersic n (right). In each panel, the full line shows the joint posterior for all clusters combined while the dot-dashed line shows 
the joint posterior obtained using the method of hyper-parameters (see text). The dashed lines are the pdf's of individual clusters. Note 
that each posterior is normalized to unity so it is not possible to draw conclusions about the quality of fit of the individual clusters from 
this plot. The standard 95% credible intervals are (1.00, 1.21) for a, (0.68, 1.06) for 7, and (4.3, 6.2) for n. With the hyper-parameters, the 
intervals are instead (0.85, 1.31) for o, (0.50, 1.28) for 7, and (3.5, 7.4) for n. We assume top-hat priors in a, In 7, and Inn. 



ize over the nuisance parameters (log r_2, log p^) to find 
the one-dimensional posterior probability distribution for 
the third parameter for each cluster. Then we combine 
the results from the individual clusters into a joint pos- 
terior which is simply the product of the the individual 
ones. We calculate 95% credible intervals for both the in- 
dividual and the joint posterior. However, we know from 
the previous analysis that each three-parameter model 
is preferred by some clusters but not by others. There- 
fore we also use the method of hyper-parameters ( |Lahav| 
et al. 2000| ) which allows the contribution from individ- 
ual data-sets to the joint posterior to be weighted. These 
weights are marginalized over assuming logarithmic pri- 
ors with the result that in the joint likelihood one replaces 



(7) 



where N% is the number of data points in data-set i. The 
upshot of all this is that clusters that are not described 
well by the model do not constrain the parameters as 
strongly as clusters that are well described. The price 
to pay is that the effective sample size is reduced which, 
all other things being equal, will lead to wider and more 
conservative credible intervals. 

The results are shown in Figure [3] where in each panel 
the fully drawn line is the joint posterior, the dotted line 
is the hyper-parameters posterior, and the dashed lines 
are the posteriors of the individual clusters. The gener- 
alized NFW models are drawn slightly away from, but 
not in disagreement with, the NFW with 95% credible 
intervals of (1.00, 1.21) for a and (0.68, 1.06) for 7. The 
interval for the Sersic n parameter is (4.3,6.2), in good 
agreement with the values reported by the Aqu arius nu- 
merical sim ulations for Milky Way-sized halos (Navarro 
2008|). The intervals derived using the method 

(0.85,1.31) 
The dif- 



ct al. 



of hyper-parameters are wider as expected: 
for a, (0.50,1.28) for 7, and (3.5,7.4) for n. 
ference between the hyper-parameters method and the 
conventional calculation illustrates the need for a cau- 
tious approach to in-homogeneous data-sets. We believe 
the hyper-parameters yields the more trustworthy results 
in the case at hand, while on the other hand we acknowl- 



edge that they are not very constraining. 

An inspection of the contribution from individual clus- 
ters reveals some issues: It is clear that for each model a 
number of clusters provide very little information about 
the third parameter, i.e. the model describes the mass 
profile almost equally well regardless of the third param- 
eter value. This is actually expected, given the varying 
size of the Bayes factors in table [4] There are also a 
few cases, particularly for the transNFW model, where 
the posterior peaks very close to or on the bounds of the 
prior. In such cases the results, e.g. the individual cred- 
ible intervals, are of course very prior-dependent which 
again indicates that the data are not very discriminatory 
with respect to the prior. On the other hand, rather dras- 
tic priors or small sub-samples must be used in order to 
significantly affect the credible intervals of the joint pos- 
terior, especially for the hyper-parameters method. 

Figure [3] shows the individual clusters' constraints on 
a, 7, and n. As could be expected given the varying 
nature of our results, there is perhaps the slightest of 
hints of a redshift-dependence in the constraints but our 
sample size does not allow us to probe such an issue 
in detail. It should again be noted that any hint of a 
redshift-dependence could actually be caused by a mass- 
dependence instead, since the two lowest redshift clusters 
in our sample are also the least massive. 

A different picture emerges when we consider the over- 
lap of the individual clusters' credible intervals for the 
slopeNFW model. For example, no value of a is con- 
tained in all 11 95% credible intervals, and only the very 
short range (1.08, 1.10) is contained in all but two inter- 
vals. Likewise the NFW a = 1 case is excluded from 
four of the eleven intervals. These results, as well as a 
visual inspection of Figure [3] puts strong doubts about 
the concept of a universal shape parameter. The situ- 
ation is not quite as compelling for the transNFW and 
Sersic models which is likely the reason that they do not 
stand out from the NFW in the model selection. In fact, 
we believe it is a reasonable statement that the success 
of the slopeNFW model is precisely due to the very dif- 
ferent preferred values of a from cluster to cluster. This 
puts a strong question mark against the idea of universal 
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Fig. 4. — The individual clusters' constraints on the third pa- 
rameter in each of the three-parameter models. In this case we 
show the 68% credible intervals, and the horizontal lines indicate 
the 68% range of the joint posterior calculated using the method 
of hyper-parameters. Refer to Table [2] for the redshifts of each 
cluster. 

third parameter. 

We conclude that there is moderate evidence for the 
slopeNFW model to be preferred over the simple NFW, 
while the transNFW and Sersic models do not stand 
out against the two-parameter NFW profile. If the in- 
ner slope of the slopeNFW model is universal, we con- 
strain it to be close to —1 but preferably slightly steeper. 
However, the spread of the individual clusters' preferred 
ranges suggests that the inner slope is not universal. 

We also comment that the method of hyper-parameters 
method could in principle be extended to the model se- 
lection analysis. As a matter of fact, since the slopeNFW 
'contains' both the NFW and the Moore models as sub- 
sets, we can derive the corresponding Bayes factor for 
the Moore profile which is only 10. This is of course a 
drastic reduction numerically but it does not alter the 
conclusion and anyway corresponds to rather convincing 
odds of about 20 000 : 1. 

5. BIASES 

So far we have discussed the interpretation of our re- 
sults with respect to the statistical evidence. However, 
a number of biases, or systematic uncertainties, can be 
thought of that may affect our results. Loosely, these 
can be grouped into biases that affect both the individ- 
ual cluster mass modeling and the combined analysis, 
and selection effects that only influence the latter. 

The analysis rests on the ability to produce deprojected 
temperature and density profiles with uncertainties that 
correctly mirror the uncertainties in the spectral analysis 
of the X-ray data. This has been discussed extensively 



in Kaastra et al. ( 2004 ) . The basic assumption in deter- 
mining the mass distribution of a galaxy cluster is that 
the cluster is relaxed, and obeys the equation of hydro- 
static equilibrium. Numerical simulations indicate that 
the additional pressure associated with turbulence and 
bulk motion in the ICM yields an underestimate of the 
mass in the region of 5 — 20% with the large r values cor- 
responding to large radii, r 50 p and greater (Nagai et al. 
[2^0^[F^retti fc Valdarnini|[2^08l|Lau et al.||2009| . We 
do not expect this bias to exceed 10% in the present case 
since we do not model further out than to ~ ^2500- On 
the other hand, the same numerical simulations indicate 
that if the turbulent pressure is accounted for, an accu- 
rate mass reconstruction is possible. This point demon- 
strates that deviations from spherical symmetry are not 
a major concern in the error budget. 

A related question is whether the parameterized pro- 
files should be tested against the total mass distribu- 
tion or the dark matter mass profile only. While the 
predictions of numerical simulations are founded in dark 
matter-only simulations, it is not clear how much a sim- 
ulated dark matter-only mass profile is modified by the 
presence of baryons. Observationally, the ICM con- 
tributes about 5 — 15% of the total density in a clus- 
ter, again increasing with radius in the range of inter- 
est here, so formally there is a difference between the 
total and the dark matter profile's radial dependence. 
To test the impact of this, we have rerun the statistical 
analyses described above without subtracting the ICM 
mass from the mass estimate of Equation pj) . We find 
only minor differences: For the two-parameter models, 
the total Bayes factors relative to the NFW profile, as- 
suming the fiducial prior as in Table [2] are 2.8 (D&M), 
2.7 (Hernquist), and 54 (Moore), i.e. there is no signifi- 
cant change in the interpretation of the results. For the 
three-parameter models, the total Bayes factors become 
-3.1 (slopeNFW), 0.6 (transNFW), and -0.2 (Sersic), 
which are in good agreement with the results in Table 4] 
Finally, the constraints on the third parameters for the 
three-parameter models are unchanged. 

The fact that our results are stable whether we test 
the mass models against the total or dark matter-only 
mass profiles allows us to gauge how important the mass 
bias caused by turbulent pressure is. The point is that 
the turbulent pressure is expected to contribute the same 
amount (or less) to the total mass estimate as the ICM 
mass: both contributions are at the 5— 15% level and ra- 
dially increasing, and at the maximum radius we consider 
~ r 2500 the gas fraction (~ 10%) is likely larger than the 
pressure bias. Since our results are the same whether we 
account for the ICM mass or not, we conclude that the 
systematic uncertainty is likely much smaller than the 
statistical uncertainty. 

6. MASS-CONCENTRATION RELATION 

An important consequence of the 'bottom-up' scenario 
of structure formation is that smaller halos are denser 
in the center, since they formed earlier when the density 
of the Universe was higher. This effect is seen in nu- 
merical simulations and it can be expressed as a relation 
between the halo mass and the concentration parame- 
ter. The concentration parameter is defined for a given 
overdensity as ca = ^a/V— 2 (often r s is used instead of 
r_2 but for the NFW this is unimportant). Simulations 
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Fig. 5. — The mass-concentration relation of our sample, cal- 
culated within the NFW model. The contours contain 95% of 
the posterior PDF and are based on the fiducial prior. We show 
two contours for each cluster: (M2500, C2500) (red, full lines) which 
are derived within the radial range of the data, and (M20O1C200) 
(blue, dot-dashed) which is based on an NFW model-dependent 
extrapolation to T2oo- The dashed lines show the mean r elations 
for th e two values of A from the N-body simulations of Maccio 
|et al.| ( |200S| ) , based on the WMAP5 cosmology. The relations 
are loge 20 o = 0.83 - 0.094 log(M 2O o/lO 12 M ) and loge 25 oo = 
0.35 - 0.130 log(M 2 5oo/lO 12 M ). Given the low redshift of our 
sample, we have not made any correction for a redshift evolution 
of ca- 

usually consider the mass-concentration relation at the 
virial radius T2ooj but we can only reach that radius by 
model-dependent extrapolation. Therefore, in Figure [5j 
we show the mass-concentration relation of our sample 
calculated within the NFW model at both r25oo and ex- 
trapolated to r 2 oo- We suggest that authors provide rela- 
tions at both of these A's as they complement each other 
in physical significance and observational accessibility. 

As can be seen in Figure [5] our sample is not ide- 
ally suited to derive a relation from given that six sam- 
ple members cluster at almost identical values of Ma- 
Instead we compare with the mas s-concentration rela - 
tion of the simulations presented in Maccio et al. ( 2008 1 , 
which are in reasonable agreement with our sample ex- 
cept for the low mass NGC533. We emphasize that the 
orientation of the uncertainty ellipses is related only to 
the parameter degeneracies present in the combination 
of model and mass profile data and has nothing to do 
with the slope of the mass-concentration relation. The 
agreement between our observed mass-concentration re- 
lation and the predictions of nume rical simulations re - 
scmbles the recent X-ray analysis of Buote et al. ( 2007 1 , 
but stands out from the significant dis crepancy of the 
lensing study in Broadhurst et al. ( 2008 1 . 



7. SUMMARY & DISCUSSION 

We have conducted a careful statistical analysis of the 
constraints on mass distribution models of galaxy clus- 
ters which can be derived from X-ray observations. We 
find that the NFW model is the preferred two-parameter 
model and that the Moore model is decisively ruled out. 
There is moderate evidence that the data require an ad- 
ditional free parameter that alters the shape of the mass 
profile, and the best choice is a model similar to the NFW 
but with a freely varying inner slope. If we assume this 
slope to be universal, we can constrain it to be close to or 



slightly steeper than the NFW, but our data suggest that 
the shape parameter must be determined individually. 

Significantly, the clusters in our sample prefer very dif- 
ferent values for the inner slope, some prefer flat cores 
while others prefer steep cusps. The shape-parameters 
of the other two three-parameter models we consider, 
the Sersic and transNFW, also show considerable scatter 
across our sample. We conclude that there is a strong in- 
dication in our data that the mass profile is not universal 
but suffers considerable halo-to-halo scatter. The limited 
size of our sample means that we cannot assess whether 
this is in disagreement with the results of numerical simu- 
lations. Of course, we can force universality of the inner 
slope, in which case we find that it is preferred to be 
slightly steeper than the NFW value of —1. However, 
when the goodness-of-fit of each cluster is taken into ac- 
count using the method of Bayesian hyper-parameters, 
the credible interval becomes significantly larger, partly 
because of the smaller effective sample size, but also be- 
cause of the lack of universality. 

This analysis stands out from the numerous observa- 
tional results that claim significant discrepancies from 
simulations based on only one or a few observed clusters. 
We acknowledge that our sample size is still limited, but 
it allows us to discuss the issue of universality. Given that 
halos in numerical simulations which include baryons are 
still not readily mass produced with sufficient resolution, 
which makes the question of halo to halo scatter difficult 
to assess, it is not possible to decide if the strong indica- 
tion of a non-universal model that we see is at odds with 
the numerical predictions. 

Our results are largely insensitive to whether we com- 
pare the models with the dark matter mass profile or 
the total mass profile, and so we cannot judge whether 
one type of model is more appropriate for the dark mat- 
ter halo or the total gravitational potential of a halo. 
There are two reasons for this: firstly, the uncertainties 
of primarily the ICM temperature profiles are too large, 
and secondly, the angular resolution in the center is not 
good enough. Of course the ICM is rather smoothly dis- 
tributed and very good statistics would be needed for a 
model to fit either the total or dark matter distribution 
significantly better. For nine of the 11 clusters of our 



sample we readily found 2MASS ( Skrutskie et al. 2006 1 
cD or BCG galaxies very close to the X-ray center but 
including these in the mass budget does not make a dif- 
ference, unless we assumed extreme mass-to-light ratios. 
This is again due to the limited angular resolution of the 
observations which implies that a large amount of dark 
matter is contained even within the radial center of the 
innermost bin. 

The robustness of our results whether we use the dark 
matter or total mass profile gives us reasonable confi- 
dence that deviations from hydrostatic equilibrium in the 
ICM are not a major problem. Such a systematic uncer- 
tainty would yield a bias of at most 10-15% according to 
numerical simulations, which is similar to the difference 
between the total and dark matter mass profiles. 
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